Overview

Dataset statistics

Number of variables: 14
Number of observations: 262029
Missing cells: 39896
Missing cells (%): 1.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 28.0 MiB
Average record size in memory: 112.0 B

Variable types

Categorical: 4
DateTime: 1
Numeric: 9

Warnings

VERSIE has constant value "1.0" (Constant)
DATUM_BESTAND has constant value "2021-04-16" (Constant)
PEILDATUM has constant value "2021-04-01" (Constant)
TYPERENDE_DIAGNOSE_CD has a high cardinality: 1766 distinct values (High cardinality)
AANTAL_PAT_PER_ZPD is highly correlated with AANTAL_SUBTRAJECT_PER_ZPD (High correlation)
AANTAL_SUBTRAJECT_PER_ZPD is highly correlated with AANTAL_PAT_PER_ZPD (High correlation)
AANTAL_PAT_PER_DIAG is highly correlated with AANTAL_SUBTRAJECT_PER_DIAG (High correlation)
AANTAL_SUBTRAJECT_PER_DIAG is highly correlated with AANTAL_PAT_PER_DIAG (High correlation)
AANTAL_PAT_PER_SPC is highly correlated with AANTAL_SUBTRAJECT_PER_SPC (High correlation)
AANTAL_SUBTRAJECT_PER_SPC is highly correlated with AANTAL_PAT_PER_SPC (High correlation)
PEILDATUM is highly correlated with DATUM_BESTAND and 1 other field (High correlation)
DATUM_BESTAND is highly correlated with PEILDATUM and 1 other field (High correlation)
VERSIE is highly correlated with PEILDATUM and 1 other field (High correlation)
GEMIDDELDE_VERKOOPPRIJS has 39896 (15.2%) missing values (Missing)
AANTAL_SUBTRAJECT_PER_ZPD is highly skewed (γ1 = 20.88851715) (Skewed)
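Most of the warnings above come from simple per-column checks. The sketch below shows one way to write the constant, high-cardinality, missing-value and skewness checks in plain Python. It is not pandas-profiling's implementation, and the helper names are hypothetical. The thresholds are assumptions: more than 50 distinct values for high cardinality, and |skewness| > 20 for skew. They agree with the flags in this report: 1766 distinct diagnosis codes are flagged, and a skewness of 20.89 is flagged while 16.34 is not. High-correlation checks are not covered here.

```python
import math

# Assumed thresholds (illustrative; believed to match pandas-profiling defaults)
CARDINALITY_THRESHOLD = 50   # flag non-numeric columns with more distinct values
SKEWNESS_THRESHOLD = 20.0    # flag numeric columns with |skewness| above this


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness, the estimator pandas' Series.skew uses."""
    n = len(xs)
    if n < 3:
        return float("nan")
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


def column_warnings(name, values):
    """Return report-style warnings for one column (hypothetical helper)."""
    warnings = []
    present = [v for v in values if not is_missing(v)]
    distinct = set(present)
    numeric = all(isinstance(v, (int, float)) for v in present)

    if len(distinct) == 1:
        warnings.append(f'{name} has constant value "{next(iter(distinct))}"')
    if not numeric and len(distinct) > CARDINALITY_THRESHOLD:
        warnings.append(f"{name} has a high cardinality: {len(distinct)} distinct values")
    n_missing = len(values) - len(present)
    if n_missing:
        pct = 100 * n_missing / len(values)
        warnings.append(f"{name} has {n_missing} ({pct:.1f}%) missing values")
    if numeric and len(distinct) > 1:
        skew = sample_skewness(present)
        if abs(skew) > SKEWNESS_THRESHOLD:
            warnings.append(f"{name} is highly skewed (γ1 = {skew:.8f})")
    return warnings
```

For example, `column_warnings("VERSIE", ["1.0"] * 5)` returns only the constant-value warning. Whether the report used exactly this skewness estimator is an assumption.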

Reproduction

Analysis started: 2021-05-05 16:40:38.867634
Analysis finished: 2021-05-05 16:41:11.278380
Duration: 32.41 seconds
Software version: pandas-profiling v2.12.0
Download configuration: config.yaml

Variables

VERSIE
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
1.0: 262029

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 786087
Distinct characters: 3
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
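As a small illustration, Python's standard unicodedata module exposes these character properties. The sketch below counts characters per general category. The mapping from two-letter codes to long names is our own and covers only the categories that occur in this report. Each DATUM_BESTAND value "2021-04-16" has 8 digits and 2 dashes. Over 262029 rows, that gives the 2096232 Decimal Number and 524058 Dash Punctuation characters reported for that column.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes that occur in this report
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Lu": "Uppercase Letter",
}


def category_counts(values):
    """Count characters per Unicode general category over all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)  # e.g. "Nd" for "7", "Pd" for "-"
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts(["2021-04-16", "2021-04-16"]))
# Counter({'Decimal Number': 16, 'Dash Punctuation': 4})
```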

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Value | Count | Frequency (%)
1.0 | 262029 | 100.0%
[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
1 | 262029 | 33.3%
. | 262029 | 33.3%
0 | 262029 | 33.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 524058 | 66.7%
Other Punctuation | 262029 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
1 | 262029 | 50.0%
0 | 262029 | 50.0%

Other Punctuation
Value | Count | Frequency (%)
. | 262029 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 786087 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
1 | 262029 | 33.3%
. | 262029 | 33.3%
0 | 262029 | 33.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 786087 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
1 | 262029 | 33.3%
. | 262029 | 33.3%
0 | 262029 | 33.3%

DATUM_BESTAND
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
2021-04-16: 262029

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 2620290
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-04-16
2nd row: 2021-04-16
3rd row: 2021-04-16
4th row: 2021-04-16
5th row: 2021-04-16

Value | Count | Frequency (%)
2021-04-16 | 262029 | 100.0%
[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
2 | 524058 | 20.0%
0 | 524058 | 20.0%
1 | 524058 | 20.0%
- | 524058 | 20.0%
4 | 262029 | 10.0%
6 | 262029 | 10.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2096232 | 80.0%
Dash Punctuation | 524058 | 20.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 524058 | 25.0%
0 | 524058 | 25.0%
1 | 524058 | 25.0%
4 | 262029 | 12.5%
6 | 262029 | 12.5%

Dash Punctuation
Value | Count | Frequency (%)
- | 524058 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2620290 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
2 | 524058 | 20.0%
0 | 524058 | 20.0%
1 | 524058 | 20.0%
- | 524058 | 20.0%
4 | 262029 | 10.0%
6 | 262029 | 10.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2620290 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
2 | 524058 | 20.0%
0 | 524058 | 20.0%
1 | 524058 | 20.0%
- | 524058 | 20.0%
4 | 262029 | 10.0%
6 | 262029 | 10.0%

PEILDATUM
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
2021-04-01: 262029

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 2620290
Distinct characters: 5
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2021-04-01
2nd row: 2021-04-01
3rd row: 2021-04-01
4th row: 2021-04-01
5th row: 2021-04-01

Value | Count | Frequency (%)
2021-04-01 | 262029 | 100.0%
[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
0 | 786087 | 30.0%
2 | 524058 | 20.0%
1 | 524058 | 20.0%
- | 524058 | 20.0%
4 | 262029 | 10.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2096232 | 80.0%
Dash Punctuation | 524058 | 20.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 786087 | 37.5%
2 | 524058 | 25.0%
1 | 524058 | 25.0%
4 | 262029 | 12.5%

Dash Punctuation
Value | Count | Frequency (%)
- | 524058 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2620290 | 100.0%

Most frequent character per script

Value | Count | Frequency (%)
0 | 786087 | 30.0%
2 | 524058 | 20.0%
1 | 524058 | 20.0%
- | 524058 | 20.0%
4 | 262029 | 10.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2620290 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
0 | 786087 | 30.0%
2 | 524058 | 20.0%
1 | 524058 | 20.0%
- | 524058 | 20.0%
4 | 262029 | 10.0%

JAAR
Date

Distinct: 10
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
Minimum: 2012-01-01 00:00:00
Maximum: 2021-01-01 00:00:00
[Figure: Histogram with fixed size bins (bins=10)]

BEHANDELEND_SPECIALISME_CD
Real number (ℝ≥0)

Distinct: 27
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 423.0680192
Minimum: 301
Maximum: 8418
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 301
5th percentile: 302
Q1: 305
Median: 313
Q3: 322
95th percentile: 335
Maximum: 8418
Range: 8117
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 926.6493055
Coefficient of variation (CV): 2.190308091
Kurtosis: 70.30172742
Mean: 423.0680192
Median Absolute Deviation (MAD): 8
Skewness: 8.496364056
Sum: 110856090
Variance: 858678.9355
Monotonicity: Not monotonic
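Several of these statistics follow from others by simple arithmetic. The check below recomputes them from the BEHANDELEND_SPECIALISME_CD figures listed above, using these relations: coefficient of variation = standard deviation / mean, variance = standard deviation squared, range = maximum - minimum, and IQR = Q3 - Q1.

```python
# Summary figures copied from the BEHANDELEND_SPECIALISME_CD section
mean = 423.0680192
std = 926.6493055
q1, q3 = 305, 322
minimum, maximum = 301, 8418

cv = std / mean               # coefficient of variation
variance = std ** 2           # variance = squared standard deviation
value_range = maximum - minimum
iqr = q3 - q1                 # interquartile range

print(f"CV={cv:.4f} variance={variance:.1f} range={value_range} IQR={iqr}")
# prints: CV=2.1903 variance=858678.9 range=8117 IQR=17
```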
[Figure: Histogram with fixed size bins (bins=27)]
Common values
Value | Count | Frequency (%)
305 | 37255 | 14.2%
313 | 33849 | 12.9%
303 | 30150 | 11.5%
330 | 20906 | 8.0%
316 | 17889 | 6.8%
308 | 13615 | 5.2%
306 | 10944 | 4.2%
324 | 10913 | 4.2%
301 | 10541 | 4.0%
304 | 8512 | 3.2%
Other values (17) | 67455 | 25.7%

Minimum 5 values
Value | Count | Frequency (%)
301 | 10541 | 4.0%
302 | 5689 | 2.2%
303 | 30150 | 11.5%
304 | 8512 | 3.2%
305 | 37255 | 14.2%

Maximum 5 values
Value | Count | Frequency (%)
8418 | 3466 | 1.3%
1900 | 171 | 0.1%
390 | 673 | 0.3%
389 | 2829 | 1.1%
362 | 3935 | 1.5%

TYPERENDE_DIAGNOSE_CD
Categorical

HIGH CARDINALITY

Distinct: 1766
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.0 MiB
101: 1099
402: 1071
403: 1039
301: 1036
203: 982
Other values (1761): 256802

Length

Max length: 4
Median length: 3
Mean length: 3.35074362
Min length: 2

Characters and Unicode

Total characters: 877992
Distinct characters: 25
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2
Unique (%): < 0.1%

Sample

1st row: 801
2nd row: 405
3rd row: 404
4th row: 399
5th row: 403

Value | Count | Frequency (%)
101 | 1099 | 0.4%
402 | 1071 | 0.4%
403 | 1039 | 0.4%
301 | 1036 | 0.4%
203 | 982 | 0.4%
201 | 979 | 0.4%
401 | 879 | 0.3%
404 | 865 | 0.3%
802 | 858 | 0.3%
409 | 848 | 0.3%
Other values (1756) | 252373 | 96.3%
[Figure: Histogram of lengths of the category]

Most occurring characters

Value | Count | Frequency (%)
1 | 168223 | 19.2%
0 | 160507 | 18.3%
2 | 116339 | 13.3%
3 | 95299 | 10.9%
5 | 67325 | 7.7%
9 | 63507 | 7.2%
4 | 62561 | 7.1%
7 | 51682 | 5.9%
6 | 46002 | 5.2%
8 | 37696 | 4.3%
Other values (15) | 8851 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 869141 | 99.0%
Uppercase Letter | 8851 | 1.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
G | 1644 | 18.6%
M | 1477 | 16.7%
B | 1059 | 12.0%
E | 763 | 8.6%
Z | 694 | 7.8%
D | 606 | 6.8%
A | 580 | 6.6%
F | 564 | 6.4%
C | 296 | 3.3%
K | 284 | 3.2%
Other values (5) | 884 | 10.0%

Decimal Number
Value | Count | Frequency (%)
1 | 168223 | 19.4%
0 | 160507 | 18.5%
2 | 116339 | 13.4%
3 | 95299 | 11.0%
5 | 67325 | 7.7%
9 | 63507 | 7.3%
4 | 62561 | 7.2%
7 | 51682 | 5.9%
6 | 46002 | 5.3%
8 | 37696 | 4.3%

Most occurring scripts

Value | Count | Frequency (%)
Common | 869141 | 99.0%
Latin | 8851 | 1.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
G | 1644 | 18.6%
M | 1477 | 16.7%
B | 1059 | 12.0%
E | 763 | 8.6%
Z | 694 | 7.8%
D | 606 | 6.8%
A | 580 | 6.6%
F | 564 | 6.4%
C | 296 | 3.3%
K | 284 | 3.2%
Other values (5) | 884 | 10.0%

Common
Value | Count | Frequency (%)
1 | 168223 | 19.4%
0 | 160507 | 18.5%
2 | 116339 | 13.4%
3 | 95299 | 11.0%
5 | 67325 | 7.7%
9 | 63507 | 7.3%
4 | 62561 | 7.2%
7 | 51682 | 5.9%
6 | 46002 | 5.3%
8 | 37696 | 4.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 877992 | 100.0%

Most frequent character per block

Value | Count | Frequency (%)
1 | 168223 | 19.2%
0 | 160507 | 18.3%
2 | 116339 | 13.3%
3 | 95299 | 10.9%
5 | 67325 | 7.7%
9 | 63507 | 7.2%
4 | 62561 | 7.1%
7 | 51682 | 5.9%
6 | 46002 | 5.2%
8 | 37696 | 4.3%
Other values (15) | 8851 | 1.0%

ZORGPRODUCT_CD
Real number (ℝ≥0)

Distinct: 5895
Distinct (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 442145006.7
Minimum: 10501002
Maximum: 998418081
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 10501002
5th percentile: 28999037
Q1: 99799063
Median: 149899002
Q3: 990004004
95th percentile: 990416053
Maximum: 998418081
Range: 987917079
Interquartile range (IQR): 890204941

Descriptive statistics

Standard deviation: 429377388.5
Coefficient of variation (CV): 0.9711234595
Kurtosis: -1.743637859
Mean: 442145006.7
Median Absolute Deviation (MAD): 119999995
Skewness: 0.4610681944
Sum: 1.15854814 × 10^14
Variance: 1.843649417 × 10^17
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
990004009 | 1898 | 0.7%
990003004 | 1879 | 0.7%
990004007 | 1871 | 0.7%
990004006 | 1518 | 0.6%
990356076 | 1343 | 0.5%
990356073 | 1246 | 0.5%
990003007 | 1204 | 0.5%
131999228 | 1141 | 0.4%
131999164 | 1127 | 0.4%
199299013 | 1087 | 0.4%
Other values (5885) | 247715 | 94.5%

Minimum 5 values
Value | Count | Frequency (%)
10501002 | 6 | < 0.1%
10501003 | 9 | < 0.1%
10501004 | 10 | < 0.1%
10501005 | 9 | < 0.1%
10501007 | 3 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
998418081 | 128 | < 0.1%
998418080 | 115 | < 0.1%
998418079 | 34 | < 0.1%
998418077 | 7 | < 0.1%
998418076 | 6 | < 0.1%

AANTAL_PAT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9080
Distinct (%): 3.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 513.6010442
Minimum: 1
Maximum: 156439
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 14
Q3: 104
95th percentile: 1744
Maximum: 156439
Range: 156438
Interquartile range (IQR): 101

Descriptive statistics

Standard deviation: 3149.337782
Coefficient of variation (CV): 6.131875739
Kurtosis: 384.8124431
Mean: 513.6010442
Median Absolute Deviation (MAD): 13
Skewness: 16.34197142
Sum: 134578368
Variance: 9918328.467
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
1 | 43132 | 16.5%
2 | 21126 | 8.1%
3 | 13724 | 5.2%
4 | 10247 | 3.9%
5 | 7897 | 3.0%
6 | 6650 | 2.5%
7 | 5556 | 2.1%
8 | 4702 | 1.8%
9 | 4355 | 1.7%
10 | 3801 | 1.5%
Other values (9070) | 140839 | 53.7%

Minimum 5 values
Value | Count | Frequency (%)
1 | 43132 | 16.5%
2 | 21126 | 8.1%
3 | 13724 | 5.2%
4 | 10247 | 3.9%
5 | 7897 | 3.0%

Maximum 5 values
Value | Count | Frequency (%)
156439 | 1 | < 0.1%
154821 | 1 | < 0.1%
153882 | 1 | < 0.1%
144701 | 1 | < 0.1%
114370 | 1 | < 0.1%

AANTAL_SUBTRAJECT_PER_ZPD
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED

Distinct: 9706
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 601.1570971
Minimum: 1
Maximum: 239907
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 3
Median: 15
Q3: 114
95th percentile: 1981
Maximum: 239907
Range: 239906
Interquartile range (IQR): 111

Descriptive statistics

Standard deviation: 3992.932227
Coefficient of variation (CV): 6.642077831
Kurtosis: 698.4977423
Mean: 601.1570971
Median Absolute Deviation (MAD): 14
Skewness: 20.88851715
Sum: 157520593
Variance: 15943507.77
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
1 | 41564 | 15.9%
2 | 20781 | 7.9%
3 | 13589 | 5.2%
4 | 10064 | 3.8%
5 | 7822 | 3.0%
6 | 6631 | 2.5%
7 | 5525 | 2.1%
8 | 4636 | 1.8%
9 | 4261 | 1.6%
10 | 3872 | 1.5%
Other values (9696) | 143284 | 54.7%

Minimum 5 values
Value | Count | Frequency (%)
1 | 41564 | 15.9%
2 | 20781 | 7.9%
3 | 13589 | 5.2%
4 | 10064 | 3.8%
5 | 7822 | 3.0%

Maximum 5 values
Value | Count | Frequency (%)
239907 | 1 | < 0.1%
232484 | 1 | < 0.1%
231317 | 1 | < 0.1%
227658 | 1 | < 0.1%
221391 | 1 | < 0.1%

AANTAL_PAT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7926
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7743.301169
Minimum: 1
Maximum: 216998
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 1
5th percentile: 43
Q1: 422
Median: 1744
Q3: 6534
95th percentile: 36952
Maximum: 216998
Range: 216997
Interquartile range (IQR): 6112

Descriptive statistics

Standard deviation: 17775.54068
Coefficient of variation (CV): 2.295602392
Kurtosis: 32.39185591
Mean: 7743.301169
Median Absolute Deviation (MAD): 1580
Skewness: 4.959241839
Sum: 2028969462
Variance: 315969846.5
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
21 | 429 | 0.2%
14 | 388 | 0.1%
17 | 375 | 0.1%
19 | 359 | 0.1%
23 | 358 | 0.1%
25 | 356 | 0.1%
37 | 350 | 0.1%
33 | 347 | 0.1%
15 | 345 | 0.1%
26 | 342 | 0.1%
Other values (7916) | 258380 | 98.6%

Minimum 5 values
Value | Count | Frequency (%)
1 | 330 | 0.1%
2 | 308 | 0.1%
3 | 301 | 0.1%
4 | 318 | 0.1%
5 | 285 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
216998 | 23 | < 0.1%
212133 | 25 | < 0.1%
209818 | 19 | < 0.1%
208372 | 17 | < 0.1%
204232 | 17 | < 0.1%

AANTAL_SUBTRAJECT_PER_DIAG
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 8815
Distinct (%): 3.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10954.70991
Minimum: 1
Maximum: 347719
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 1
5th percentile: 55
Q1: 552
Median: 2407
Q3: 9063
95th percentile: 51707
Maximum: 347719
Range: 347718
Interquartile range (IQR): 8511

Descriptive statistics

Standard deviation: 25947.3662
Coefficient of variation (CV): 2.368603681
Kurtosis: 36.44294955
Mean: 10954.70991
Median Absolute Deviation (MAD): 2203
Skewness: 5.225350286
Sum: 2870451682
Variance: 673265812.9
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
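Common values
Value | Count | Frequency (%)
25 | 316 | 0.1%
32 | 302 | 0.1%
18 | 297 | 0.1%
17 | 294 | 0.1%
38 | 290 | 0.1%
11 | 285 | 0.1%
19 | 284 | 0.1%
34 | 284 | 0.1%
1 | 279 | 0.1%
22 | 278 | 0.1%
Other values (8805) | 259120 | 98.9%

Minimum 5 values
Value | Count | Frequency (%)
1 | 279 | 0.1%
2 | 244 | 0.1%
3 | 260 | 0.1%
4 | 254 | 0.1%
5 | 227 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
347719 | 23 | < 0.1%
345669 | 25 | < 0.1%
340520 | 19 | < 0.1%
323708 | 20 | < 0.1%
305768 | 17 | < 0.1%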

AANTAL_PAT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 258
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 677488.3305
Minimum: 3
Maximum: 1489511
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 3
5th percentile: 46186
Q1: 292896
Median: 748499
Q3: 995511
95th percentile: 1345267
Maximum: 1489511
Range: 1489508
Interquartile range (IQR): 702615

Descriptive statistics

Standard deviation: 411806.1862
Coefficient of variation (CV): 0.6078424788
Kurtosis: -1.05883127
Mean: 677488.3305
Median Absolute Deviation (MAD): 306271
Skewness: 0.0118735588
Sum: 1.775215898 × 10^11
Variance: 1.69584335 × 10^11
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
880967 | 5102 | 1.9%
874259 | 4354 | 1.7%
843996 | 4348 | 1.7%
893038 | 4332 | 1.7%
874979 | 4271 | 1.6%
852295 | 4166 | 1.6%
1084205 | 3891 | 1.5%
1063762 | 3851 | 1.5%
1077534 | 3847 | 1.5%
1045611 | 3820 | 1.5%
Other values (248) | 220047 | 84.0%
ValueCountFrequency (%)
33
 
< 0.1%
45
 
< 0.1%
51
 
< 0.1%
62
 
< 0.1%
1021
< 0.1%
Maximum 5 values
Value | Count | Frequency (%)
1489511 | 2976 | 1.1%
1450632 | 3054 | 1.2%
1421864 | 3564 | 1.4%
1345267 | 3543 | 1.4%
1332918 | 3546 | 1.4%

AANTAL_SUBTRAJECT_PER_SPC
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 258
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1075098.901
Minimum: 3
Maximum: 2582911
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 3
5th percentile: 49553
Q1: 483156
Median: 1088467
Q3: 1729116
95th percentile: 2488921
Maximum: 2582911
Range: 2582908
Interquartile range (IQR): 1245960

Descriptive statistics

Standard deviation: 714274.2511
Coefficient of variation (CV): 0.6643800405
Kurtosis: -0.8748013266
Mean: 1075098.901
Median Absolute Deviation (MAD): 631527
Skewness: 0.2984517275
Sum: 2.817070898 × 10^11
Variance: 5.101877057 × 10^11
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
1211813 | 5102 | 1.9%
1281549 | 4354 | 1.7%
1216284 | 4348 | 1.7%
1312993 | 4332 | 1.7%
1290187 | 4271 | 1.6%
1263165 | 4166 | 1.6%
2557169 | 3891 | 1.5%
2489027 | 3851 | 1.5%
2582911 | 3847 | 1.5%
2488921 | 3820 | 1.5%
Other values (248) | 220047 | 84.0%
ValueCountFrequency (%)
33
 
< 0.1%
45
 
< 0.1%
51
 
< 0.1%
62
 
< 0.1%
1021
< 0.1%
Maximum 5 values
Value | Count | Frequency (%)
2582911 | 3847 | 1.5%
2557169 | 3891 | 1.5%
2489027 | 3851 | 1.5%
2488921 | 3820 | 1.5%
2184663 | 3757 | 1.4%

GEMIDDELDE_VERKOOPPRIJS
Real number (ℝ≥0)

MISSING

Distinct: 3182
Distinct (%): 1.4%
Missing: 39896
Missing (%): 15.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 3522.892952
Minimum: 20
Maximum: 287220
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.0 MiB

Quantile statistics

Minimum: 20
5th percentile: 140
Q1: 465
Median: 1245
Q3: 4080
95th percentile: 13290
Maximum: 287220
Range: 287200
Interquartile range (IQR): 3615

Descriptive statistics

Standard deviation: 6596.672459
Coefficient of variation (CV): 1.872515728
Kurtosis: 167.6851786
Mean: 3522.892952
Median Absolute Deviation (MAD): 1010
Skewness: 7.813549912
Sum: 782550780
Variance: 43516087.53
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
Common values
Value | Count | Frequency (%)
105 | 1855 | 0.7%
160 | 1831 | 0.7%
110 | 1465 | 0.6%
180 | 1382 | 0.5%
120 | 1326 | 0.5%
185 | 1306 | 0.5%
300 | 1267 | 0.5%
145 | 1265 | 0.5%
140 | 1211 | 0.5%
500 | 1159 | 0.4%
Other values (3172) | 208066 | 79.4%
(Missing) | 39896 | 15.2%

Minimum 5 values
Value | Count | Frequency (%)
20 | 1 | < 0.1%
70 | 226 | 0.1%
75 | 75 | < 0.1%
80 | 361 | 0.1%
85 | 929 | 0.4%

Maximum 5 values
Value | Count | Frequency (%)
287220 | 8 | < 0.1%
148910 | 3 | < 0.1%
142855 | 4 | < 0.1%
122155 | 4 | < 0.1%
116765 | 3 | < 0.1%

Interactions

[Figures: 72 pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and under positive changes in scale of the two variables. This implies that, for an exactly linear relationship, the slope of the line (its angle to the x-axis) does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
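A minimal pure-Python sketch of this definition follows. The function name is ours, not the report's. Sample (n - 1) normalisation is used for both the covariance and the standard deviations, so it cancels. The last two calls illustrate the point about slopes: a steep line still gives r ≈ 1, and a negative slope gives r ≈ -1.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (std_x * std_y)  # ZeroDivisionError if either variable is constant


print(pearson_r([1, 2, 3], [1, 3, 2]))            # 0.5
print(pearson_r([1, 2, 3, 4], [10, 20, 30, 40]))  # ~1.0: the slope does not matter
print(pearson_r([1, 2, 3, 4], [4, 3, 2, 1]))      # ~-1.0: only its sign does
```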

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
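The rank-then-correlate recipe can be sketched as follows. Tied values receive the average of the ranks they occupy, and Pearson's r is then computed on the ranks. The helper names are hypothetical. Because the 1/(n - 1) factors cancel, plain sums are used instead of scaled moments. For y = x³ the ranks agree exactly, so ρ ≈ 1 even though the relationship is not linear.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    # The 1/(n-1) factors of covariance and standard deviations cancel, so sums suffice
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


print(average_ranks([10, 20, 20, 30]))             # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))  # ~1.0 for the monotonic y = x**3
```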

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, which is n(n-1)/2 for n observations.
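The pair-counting definition translates directly into an O(n²) sketch (hypothetical helper). Pairs tied in X or Y count as neither concordant nor discordant, which matches the simple formula above, often called τ-a. Libraries such as SciPy's kendalltau default to a tie-corrected variant (τ-b), so their results can differ from this sketch when ties are present. Which variant the report uses is not stated here.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, checking all n(n-1)/2 pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # they move in opposite directions
        # s == 0: tied in X or Y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant -> 0.333...
```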

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the documentation of the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The standard empirical estimator of Cramér's V has been shown to be biased, even for large samples, so the report uses the bias-corrected measure proposed by Bergsma (2013).
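Below is a sketch of the bias-corrected estimator. It is written from the published correction rather than taken from the report's code, so treat it as an approximation of what the report computes; the function name is hypothetical. Let χ² be the Pearson chi-squared statistic of the r × k contingency table and n the number of observations. Then φ² = χ²/n is corrected to max(0, φ² - (k-1)(r-1)/(n-1)). The table dimensions are corrected to r - (r-1)²/(n-1) and k - (k-1)²/(n-1). V is the square root of the corrected φ² divided by the smaller corrected dimension minus one. The sketch assumes at least two observations.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma 2013); assumes len(xs) == len(ys) >= 2."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the r x k contingency table
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected

    # Bias corrections for phi^2 and for the table dimensions
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:  # a constant variable carries no association
        return 0.0
    return math.sqrt(phi2_corr / denom)
```

Two perfectly paired labels (a with x, b with y) give V ≈ 1. A balanced table in which every combination is equally frequent gives V = 0.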

Missing values

A simple visualization of nullity by column.
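The counts behind this chart are the number of missing entries per column. Below is a plain-Python sketch that treats None, float NaN and absent keys as missing. With pandas, df.isna().sum() gives the same counts. The function name, the row-of-dicts input format and the four example rows are made up for illustration. Applied to this dataset, the counts would show all 39896 missing cells (15.2% of rows) in GEMIDDELDE_VERKOOPPRIJS and none elsewhere, as reported above.

```python
import math


def nullity_by_column(rows, columns):
    """Return {column: (missing count, missing %)} for a non-empty list of row dicts.

    A value counts as missing if it is None, float NaN, or absent from the row.
    """
    result = {}
    for col in columns:
        missing = 0
        for row in rows:
            value = row.get(col)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                missing += 1
        result[col] = (missing, 100 * missing / len(rows))
    return result


# Made-up rows for illustration only
rows = [
    {"JAAR": 2012, "GEMIDDELDE_VERKOOPPRIJS": 100.0},
    {"JAAR": 2012, "GEMIDDELDE_VERKOOPPRIJS": float("nan")},
    {"JAAR": 2018, "GEMIDDELDE_VERKOOPPRIJS": None},
    {"JAAR": 2018, "GEMIDDELDE_VERKOOPPRIJS": 200.0},
]
print(nullity_by_column(rows, ["JAAR", "GEMIDDELDE_VERKOOPPRIJS"]))
# {'JAAR': (0, 0.0), 'GEMIDDELDE_VERKOOPPRIJS': (2, 50.0)}
```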
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completeness visually.
The dendrogram groups variables by how similar their patterns of completeness are, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

VERSIEDATUM_BESTANDPEILDATUMJAARBEHANDELEND_SPECIALISME_CDTYPERENDE_DIAGNOSE_CDZORGPRODUCT_CDAANTAL_PAT_PER_ZPDAANTAL_SUBTRAJECT_PER_ZPDAANTAL_PAT_PER_DIAGAANTAL_SUBTRAJECT_PER_DIAGAANTAL_PAT_PER_SPCAANTAL_SUBTRAJECT_PER_SPCGEMIDDELDE_VERKOOPPRIJS
01.02021-04-162021-04-012012-01-013248011319992062192359411050237383280015285.0
11.02021-04-162021-04-012012-01-013244051319992071681681162313281237383280015600.0
21.02021-04-162021-04-012012-01-01324404131999118449081026237383280015825.0
31.02021-04-162021-04-012012-01-013243991319990103939212224852373832800152230.0
41.02021-04-162021-04-012012-01-0132440313199908488177419632373832800151110.0
51.02021-04-162021-04-012012-01-01324711131999206993943237383280015285.0
61.02021-04-162021-04-012012-01-013243161319992082424127153237383280015400.0
71.02021-04-162021-04-012012-01-01324604131999207114858237383280015600.0
81.02021-04-162021-04-012012-01-0132430913199915411944311467237383280015NaN
91.02021-04-162021-04-012012-01-013241091319992084747285347237383280015400.0

Last rows

VERSIEDATUM_BESTANDPEILDATUMJAARBEHANDELEND_SPECIALISME_CDTYPERENDE_DIAGNOSE_CDZORGPRODUCT_CDAANTAL_PAT_PER_ZPDAANTAL_SUBTRAJECT_PER_ZPDAANTAL_PAT_PER_DIAGAANTAL_SUBTRAJECT_PER_DIAGAANTAL_PAT_PER_SPCAANTAL_SUBTRAJECT_PER_SPCGEMIDDELDE_VERKOOPPRIJS
2620191.02021-04-162021-04-012018-01-013270118990027140449301533199707370698NaN
2620201.02021-04-162021-04-012018-01-0132705129900272091122914416199707370698NaN
2620211.02021-04-162021-04-012018-01-013270216990027144101032106204199707370698NaN
2620221.02021-04-162021-04-012018-01-0132701159900271365151156182558719970737069826050.0
2620231.02021-04-162021-04-012018-01-013270516990027200404613422944199707370698NaN
2620241.02021-04-162021-04-012018-01-01327031499002716018202409517298581997073706983160.0
2620251.02021-04-162021-04-012018-01-013270216990027135113210620419970737069840545.0
2620261.02021-04-162021-04-012018-01-0132707159900271882020764112851199707370698NaN
2620271.02021-04-162021-04-012018-01-013270315990027159118140107520611997073706989500.0
2620281.02021-04-162021-04-012018-01-01327041599002716422222444427419970737069820170.0